{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G3MMAcssHTML"
      },
      "source": [
        "<link rel=\"stylesheet\" href=\"/site-assets/css/gemma.css\">\n",
        "<link rel=\"stylesheet\" href=\"https://fonts.googleapis.com/css2?family=Google+Symbols:opsz,wght,FILL,GRAD@20..48,100..700,0..1,-50..200\" />"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Tce3stUlHN0L"
      },
      "source": [
        "##### Copyright 2025 Google LLC."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "tuOe1ymfHZPu"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SDEExiAk4fLb"
      },
      "source": [
        "# Prompt with images and text using Gemma library"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4qxv4Sn9b8CE"
      },
      "source": [
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://ai.google.dev/gemma/docs/core/gemma_library\"><img src=\"https://ai.google.dev/static/site-assets/images/docs/notebook-site-button.png\" height=\"32\" width=\"32\" />View on ai.google.dev</a>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/core/gemma_library.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/google/generative-ai-docs/main/site/en/gemma/docs/core/gemma_library.ipynb\"><img src=\"https://ai.google.dev/images/cloud-icon.svg\" width=\"40\" />Open in Vertex AI</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/google/generative-ai-docs/blob/main/site/en/gemma/docs/core/gemma_library.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PXNm5_p_oxMF"
      },
      "source": [
        "Using images for prompting Gemma models opens up a whole new range of possibilies for understanding your world and solving problems with visual data. Starting with [Gemma 3](/gemma/docs/core) in 4B sizes and higher, you can use image data as part of your prompt to for richer context and to solve more complex tasks.\n",
        "\n",
        "This tutorial shows you how to prompt Gemma with images using the [Gemma library](https://gemma-llm.readthedocs.io/) for JAX. Gemma library is a Python package built as an extension of [JAX](https://github.com/jax-ml/jax), letting you use the performance advantages of the JAX framework with dramatically less code.\n",
        "\n",
        "Note: For more the up-to-data information this library, see the [Gemma library](https://gemma-llm.readthedocs.io/) documentation."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QQ6W7NzRe1VM"
      },
      "source": [
        "## Setup\n",
        "\n",
        "To complete this tutorial, you'll first need to complete the setup instructions at [Gemma setup](https://ai.google.dev/gemma/docs/setup). The Gemma setup instructions show you how to do the following:\n",
        "\n",
        "* Get access to Gemma on [kaggle.com](https://www.kaggle.com).\n",
        "* Select a Colab runtime with sufficient resources to run\n",
        "  the Gemma model.\n",
        "* Generate and configure a Kaggle username and API key.\n",
        "\n",
        "After you've completed the Gemma setup, move on to the next section, where you'll set environment variables for your Colab environment."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z9oy3QUmXtSd"
      },
      "source": [
        "### Install libraries\n",
        "\n",
        "Install the Gemma library."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UcGLzDeQ8NwN"
      },
      "outputs": [],
      "source": [
        "!pip install -q gemma"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_gN-IVRC3dQe"
      },
      "source": [
        "### Set environment variables\n",
        "\n",
        "Set environment variables for `KAGGLE_USERNAME` and `KAGGLE_KEY`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DrBoa_Urw9Vx"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from google.colab import userdata\n",
        "\n",
        "# Note: `userdata.get` is a Colab API. If you're not using Colab, set the env\n",
        "# vars as appropriate for your system.\n",
        "os.environ[\"KAGGLE_USERNAME\"] = userdata.get('KAGGLE_USERNAME')\n",
        "os.environ[\"KAGGLE_KEY\"] = userdata.get('KAGGLE_KEY')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nbFs22cT_k_Y"
      },
      "source": [
        "Set the JAX environment to use the full GPU memory space."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Pv5gGAAg_xtQ"
      },
      "outputs": [],
      "source": [
        "os.environ[\"XLA_PYTHON_CLIENT_MEM_FRACTION\"]=\"1.00\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "599765c72722"
      },
      "source": [
        "### Import packages\n",
        "\n",
        "Import the Gemma library and additional support libraries."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "f2fa267d75bc"
      },
      "outputs": [],
      "source": [
        "# Common imports\n",
        "import os\n",
        "import jax\n",
        "import jax.numpy as jnp\n",
        "import tensorflow_datasets as tfds\n",
        "\n",
        "# Gemma imports\n",
        "from gemma import gm"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZsxDCbLN555T"
      },
      "source": [
        "## Configure a model\n",
        "\n",
        "Select and configure a Gemma model for use, including a tokenizer, model architecture, and checkpoints. The Gemma libary supports all of Google's official releases of the model. You must use the `Gemma3Tokenizer` and a Gemma 3 or later model to be able to process images as part of your prompt.\n",
        "\n",
        "To configure the model, run the following code:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yygIK9DEIldp"
      },
      "outputs": [],
      "source": [
        "tokenizer = gm.text.Gemma3Tokenizer()\n",
        "\n",
        "model = gm.nn.Gemma3_4B()\n",
        "\n",
        "params = gm.ckpts.load_params(gm.ckpts.CheckpointPath.GEMMA3_4B_IT)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FOBW7piN5-sl"
      },
      "source": [
        "## Generate text with text\n",
        "\n",
        "Start by prompting with text. The Gemma library provides a Sampler function for simple prompting."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "aae5GHrdpj2_"
      },
      "outputs": [],
      "source": [
        "sampler = gm.text.Sampler(\n",
        "    model=model,\n",
        "    params=params,\n",
        "    tokenizer=tokenizer,\n",
        ")\n",
        "\n",
        "sampler.sample('Roses are red.', max_new_tokens=30)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qH0eFH_DvYwM"
      },
      "source": [
        "Change the prompt and change the maximum number of tokens to generate different output."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vVlCnY7Gvm7U"
      },
      "source": [
        "## Generate text with images\n",
        "\n",
        "Once you have a text prompt working, you can add images to your prompt. Make sure you have configure a Gemma 3 or later model that is 4B or higher, and configured the `Gemma3Tokenizer`."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "U_gKV_cEGAQ_"
      },
      "source": [
        "### Load an image\n",
        "\n",
        "Load an image from a data source or a local file. The following code shows how to load an image from a TensorFlow datasource:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VEyTnnNGvgGG"
      },
      "outputs": [],
      "source": [
        "ds = tfds.data_source('oxford_flowers102', split='train')\n",
        "image = ds[0]['image']\n",
        "\n",
        "# display the image\n",
        "image"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Jm6KW0pEI6gX"
      },
      "source": [
        "### Prepare prompt with image data\n",
        "\n",
        "When you prompt with image data, you include a specific tag `<start_of_image>`, to include the image with the text your prompt. You then encode the prompt with the image data using the `tokenizer` object to prepare to run it with the model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5eIwxSEWKI7u"
      },
      "outputs": [],
      "source": [
        "prompt = \"\"\"<start_of_turn>user\n",
        "Describe the contents of this image.\n",
        "\n",
        "<start_of_image>\n",
        "\n",
        "<end_of_turn>\n",
        "<start_of_turn>model\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VMj0wphLKzQ1"
      },
      "source": [
        "If you want to prompt with more than one image, you must include a `<start_of_image>` tag for each image included in your prompt."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cgwSiapfL09z"
      },
      "source": [
        "### Run the prompt with image data\n",
        "\n",
        "After you prepare your image data and the prompt with image tags, you can run the prompt and generate output. The following code shows how to use the `Sampler` function run the prompt:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "46tsdilpMIxb"
      },
      "outputs": [],
      "source": [
        "sampler = gm.text.Sampler(\n",
        "    model=model,\n",
        "    params=params,\n",
        "    tokenizer=tokenizer,\n",
        ")\n",
        "\n",
        "out = sampler.sample(prompt, images=image, max_new_tokens=500)\n",
        "print(out)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-HUzMDoVtZOw"
      },
      "source": [
        "Alternatively, you can use the `gm.text.ChatSampler()` function generate a response without requiring `<start_of_turn>` tags. For more details, see the [Gemma library for JAX](https://gemma-llm.readthedocs.io/) documentation."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z4kXWLwwMqOo"
      },
      "source": [
        "## Next steps\n",
        "\n",
        "The Gemma library provides much more additional functionality. See these additional resources for more information:\n",
        "\n",
        "*   [Gemma library sampling](https://gemma-llm.readthedocs.io/en/latest/colab_sampling.html)\n",
        "*   [Gemma library finetuning](https://gemma-llm.readthedocs.io/en/latest/colab_finetuning.html)\n",
        "\n",
        "The Gemma library for JAX provides additional functionality, including LoRA, Sharding, Quantization and more. For more details, see the [Gemma library](https://gemma-llm.readthedocs.io) documentation. If you have any feedback, or have issues using Gemma library, submit them through the repository [Issues](https://github.com/google-deepmind/gemma/issues) interface.\n"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "name": "gemma_library.ipynb",
      "toc_visible": true
    },
    "google": {
      "image_path": "/site-assets/images/marketing/gemma.png",
      "keywords": [
        "examples",
        "gemma",
        "python",
        "quickstart",
        "text"
      ]
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
